The study of the Gaussianity of the cosmic microwave background (CMB) 
radiation is a key topic to understand the process of structure formation in 
the Universe. In this paper, we review a very useful tool to perform this type 
of analysis, the Rayner & Best smooth tests of goodness of fit. We describe 
how the method has been adapted for its application to imaging and inter- 
ferometric observations of the CMB and comment on some recent and future 
applications of this technique to CMB data. 



1 Introduction 

The study of the Gaussianity of the cosmic microwave background (CMB) 
fluctuations has become a very useful tool for constraining theories of 
structure formation. The standard inflationary scenario predicts Gaussian 
fluctuations, whereas competing theories would imprint non-Gaussian 
signatures on the CMB (see [5] for a review). Therefore, the study of the 
Gaussianity of the CMB can help to discard or constrain some of these theories. 
Moreover, secondary effects (e.g. gravitational lensing, Rees-Sciama effect, 
Sunyaev-Zeldovich effect...), astrophysical emissions and systematics may as 
well leave non-Gaussian imprints on the CMB, which should not be confused 
with intrinsic non-Gaussianity. 

Given the importance of this type of analysis and taking into account 
that different methods may be sensitive to different kinds of non-Gaussianity, 
many tools have been developed for the study of the temperature distribution 
of the CMB. Among others, they include the Minkowski functionals [24], the 
bispectrum [18], wavelet techniques [4], geometrical estimators [27] and smooth 
tests of goodness of fit [3]. 

Interest in this type of analysis has increased even more since the 
release of the WMAP data [7]. A large number of different techniques have 
been applied to study whether or not these data follow a homogeneous and 
isotropic Gaussian random field, finding in some cases unexpected results. 
In particular, a significant number of works have reported deviations from 
Gaussianity and/or isotropy, whose origin is uncertain (e.g. [35, 17, 20, 12, 
13, 15, 26, 37], see also [25] for a review). 

In this paper, we review the Rayner and Best smooth tests of goodness of 
fit for the study of the Gaussianity of the CMB. In section 2 we describe the 
test and how to adapt the method for its application to CMB observations. A 
discussion about current and future applications to different CMB datasets is 
given in section 3. Finally our conclusions are summarised in section 4. 



2 The Rayner and Best smooth tests of goodness of fit 

Given a statistical variable X and n independent realizations $X_i$, i = 1, ..., n, 
we want to test if X follows a given probability density function (pdf) f(x). 
The smooth tests of goodness of fit (gof) allow one to discriminate between 
a predetermined pdf f(x) (null hypothesis) and a second one that deviates 
smoothly from the former (alternative hypothesis). 

Among the possible forms for the alternative pdf, Rayner & Best [28, 29] 
consider: 

$$ f_k(x, \theta) = C(\theta) \exp\left\{ \sum_{i=1}^{k} \theta_i h_i(x) \right\} f(x) \qquad (1) $$

where $\theta = (\theta_1, \ldots, \theta_k)$ is a set of k parameters that 
allows for smooth deviations of the alternative hypothesis with respect to 
f(x), $C(\theta)$ is a normalisation constant that ensures that $f_k$ is 
normalised to 1 and the $h_i$ form a complete set of functions orthonormal 
on f. Note that for $\theta = 0$ we recover f(x); therefore, our statistical 
analysis consists of testing the null hypothesis $H_0: \theta = 0$ versus the 
alternative hypothesis $H_1: \theta \neq 0$. 

To perform this analysis, the score statistic is used. This is a quantity 
which is closely related to the likelihood ratio (see e.g. [28]). For the Rayner 
& Best smooth tests of gof, the score statistic associated with the alternative 
of order k is given by 

$$ S_k = \sum_{i=1}^{k} U_i^2 \qquad (2) $$

$$ \mathrm{with} \quad U_i = \frac{1}{\sqrt{n}} \sum_{j=1}^{n} h_i(x_j) \qquad (3) $$

Large values of $S_k$ (or of $U_i^2$) reject the null hypothesis. 

In the case of testing if our data follow a Gaussian distribution of zero mean 
and unit dispersion, the $h_i$ are given by the (normalised) Hermite-Chebyshev 
polynomials. 

If the Gaussian hypothesis holds, the $U_i^2$ follow a $\chi^2_1$ distribution 
when $n \to \infty$. This allows one to determine easily the significance of 
any possible deviation from Gaussianity by comparing the value of the $U_i^2$ 
of the data with a $\chi^2_1$ distribution. 
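
As an illustration of equations (2) and (3) for the Gaussian null hypothesis, the following minimal Python sketch evaluates the $U_i^2$ and $S_k$ for a sample assumed to have zero mean and unit dispersion. It assumes the normalisation $h_i = He_i / \sqrt{i!}$, where $He_i$ are the probabilists' Hermite polynomials, and uses the asymptotic $\chi^2_1$ law to assign a significance. The function names are hypothetical and are not taken from [28] or [3].

```python
import math
import random


def normalised_hermite(x, k):
    """Return [h_1(x), ..., h_k(x)] with h_i = He_i / sqrt(i!) (assumed normalisation)."""
    he_prev, he = 1.0, x                    # He_0 and He_1
    values = [he]                           # h_1 = He_1 / sqrt(1!)
    for i in range(1, k):
        # Recurrence He_{i+1}(x) = x He_i(x) - i He_{i-1}(x)
        he_prev, he = he, x * he - i * he_prev
        values.append(he / math.sqrt(math.factorial(i + 1)))
    return values


def score_statistics(data, k=4):
    """Return ([U_1^2, ..., U_k^2], S_k) for the N(0, 1) null hypothesis."""
    n = len(data)
    sums = [0.0] * k
    for x in data:
        for i, h in enumerate(normalised_hermite(x, k)):
            sums[i] += h
    u2 = [(s / math.sqrt(n)) ** 2 for s in sums]   # U_i = sum_j h_i(x_j) / sqrt(n)
    return u2, sum(u2)                             # S_k = sum_i U_i^2


def chi2_1_pvalue(u2):
    """Asymptotic P(chi^2_1 > u2), valid for large n."""
    return math.erfc(math.sqrt(u2 / 2.0))


if __name__ == "__main__":
    random.seed(1)
    sample = [random.gauss(0.0, 1.0) for _ in range(10000)]
    u2, s_k = score_statistics(sample, k=4)
    for i, value in enumerate(u2, start=1):
        print(f"U_{i}^2 = {value:.3f}  p = {chi2_1_pvalue(value):.3f}")
```

A small p-value for a given $U_i^2$ would indicate a deviation from Gaussianity in the corresponding statistic, subject to the large-n approximation.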

We must point out that the proposed technique is designed to test if the 
data follow a univariate Gaussian. Thus, for optimality, it should be applied 
to independent data. However, the CMB signal is correlated at all scales and 
the noise may as well present correlations. Therefore, before applying the gof 
test, it is necessary to transform the data to make them as independent as 
possible. 

One possibility is to obtain the Cholesky decomposition of the correlation 
matrix of the data (including signal plus noise), $C = L L^t$, and then multiply 
the $x_i$ by the inverse of the Cholesky matrix, i.e. $y_i = \sum_j L^{-1}_{ij} x_j$. 
The constructed $y_i$ are uncorrelated and have zero mean and unit dispersion. 
Moreover, if the data are Gaussian, they also follow a normal distribution and 
are independent. 
This decorrelation technique has been used for analysing the MAXIMA data 
with different smooth tests of gof [11, 1, 2]. Nevertheless, the preprocessing 
of the data has been improved in subsequent works through the use of a 
signal-to-noise decomposition, which is explained in the next subsection. 
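
The Cholesky decorrelation step can be sketched as follows. This is a minimal illustration using only the Python standard library, assuming that the correlation matrix C is symmetric and positive definite and is given as a list of rows. The names `cholesky` and `decorrelate` are hypothetical, and a real analysis would normally rely on a numerical linear algebra library.

```python
import math


def cholesky(C):
    """Return the lower-triangular L with C = L L^t (C symmetric positive definite)."""
    n = len(C)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                L[i][j] = math.sqrt(C[i][i] - s)
            else:
                L[i][j] = (C[i][j] - s) / L[j][j]
    return L


def decorrelate(x, C):
    """Return y = L^{-1} x, obtained by forward substitution (no explicit inverse)."""
    L = cholesky(C)
    y = []
    for i in range(len(x)):
        s = sum(L[i][j] * y[j] for j in range(i))
        y.append((x[i] - s) / L[i][i])
    return y
```

Solving the triangular system instead of forming $L^{-1}$ explicitly is a design choice of this sketch, made for numerical stability; the result is the same $y_i$ as in the text.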

2.1 Signal-to-noise decomposition 

The signal-to-noise decomposition was introduced in the CMB field by [10], 
whereas [3] applied this formalism jointly with the gof test. This technique 
allows one to construct uncorrelated eigenmodes from the data, each of which 
is associated with a certain signal-to-noise ratio. 

Let us consider a set of CMB data $d_i$, i = 1, ..., n, where i corresponds to 
a given position in the sky. This can be written as 

$$ d_i = s_i + n_i \qquad (5) $$

where $s_i$ and $n_i$ are the contributions from the CMB signal and noise, 
respectively. The mean values of signal and noise are assumed to be zero and 
their correlation matrices are given by $S_{ij} = \langle s_i s_j \rangle$ and 
$N_{ij} = \langle n_i n_j \rangle$, where the brackets indicate an average over 
many realizations. 

Footnote 3: The form of the $h_i$ for other usual distributions (e.g. uniform, 
exponential) can be found in [28]. 

Footnote 4: The moment of order k of the data is defined as 
$\hat{\mu}_k = \sum_{j=1}^{n} y_j^k / n$. 

The signal-to-noise eigenmodes are defined as 

$$ \xi = R^t L_n^{-1} d \qquad (6) $$

where $L_n$ is the Cholesky matrix of N, i.e. $N = L_n L_n^t$, and R is the rotation 
matrix that diagonalizes the matrix $A = L_n^{-1} S L_n^{-t}$. The eigenvalues of this 
diagonalization are denoted by $E_i$. Let us now construct the quantities $y_i$: 

$$ y_i = \frac{\xi_i}{\sqrt{E_i + 1}} \qquad (7) $$

It can be shown that these quantities are uncorrelated and have zero mean 
and unit dispersion. Moreover, if the data d are multinormal, then the yi are 
distributed according to a Gaussian pdf, since all the applied transformations 
are linear. In this case the yi are also independent. Therefore we are in the 
optimal conditions to apply the gof tests to the quantities yi. 
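
The fact that these quantities are uncorrelated with unit dispersion follows from a short calculation. The sketch below assumes that signal and noise are uncorrelated with each other, so that $\langle d d^t \rangle = S + N$, that the eigenmodes are $\xi = R^t L_n^{-1} d$, and that R is orthogonal with $R^t A R = \mathrm{diag}(E_i)$:

$$
\langle \xi \xi^t \rangle
  = R^t L_n^{-1} \langle d d^t \rangle L_n^{-t} R
  = R^t L_n^{-1} (S + N) L_n^{-t} R
  = R^t (A + I) R
  = \mathrm{diag}(E_i + 1)
$$

Hence $\langle \xi_i \xi_j \rangle = (E_i + 1) \delta_{ij}$, and dividing each $\xi_i$ by $\sqrt{E_i + 1}$ gives $\langle y_i y_j \rangle = \delta_{ij}$. The zero mean follows from the linearity of the transformation and the zero mean of d.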

In addition, we also have information about the signal-to-noise ratio of the 
i-th eigenmode, which is given by $\sqrt{E_i}$. This means that eigenmodes with low 
values of $E_i$ are dominated by noise and may be discarded from the analysis. 
Therefore, in practice, the gof test will be applied to the subset of $y_i$ whose 
signal-to-noise ratio is greater than a given threshold, i.e. $E_i > E_{cut}$. Thus, 
this decomposition allows us not only to obtain uncorrelated variables but also 
to select the fraction of the data where the signal contribution dominates over 
the noise. 
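
The decomposition and the selection of eigenmodes can be sketched with NumPy as follows. This is only an illustration under stated assumptions: S and N are dense symmetric matrices with N positive definite, the eigenmodes are normalised by $\sqrt{E_i + 1}$ so that they have unit dispersion, and the function names `sn_transform` and `select_eigenmodes` are hypothetical.

```python
import numpy as np


def sn_transform(S, N):
    """Return (W, E) such that y = W @ d are the normalised signal-to-noise eigenmodes."""
    n = N.shape[0]
    L_n = np.linalg.cholesky(N)                    # N = L_n L_n^t (lower triangular)
    L_inv = np.linalg.solve(L_n, np.eye(n))        # L_n^{-1}
    A = L_inv @ S @ L_inv.T                        # A = L_n^{-1} S L_n^{-t}
    A = 0.5 * (A + A.T)                            # remove round-off asymmetry
    E, R = np.linalg.eigh(A)                       # A R = R diag(E), R orthogonal
    # Row i of R^t L_n^{-1} is scaled by 1 / sqrt(E_i + 1) to give unit dispersion.
    W = (R.T @ L_inv) / np.sqrt(E + 1.0)[:, None]
    return W, E


def select_eigenmodes(d, S, N, e_cut):
    """Return the y_i (and their E_i) with E_i > e_cut."""
    W, E = sn_transform(S, N)
    y = W @ d
    keep = E > e_cut
    return y[keep], E[keep]
```

The returned y could then be passed to a routine that evaluates the $U_i^2$ statistics. Note that `numpy.linalg.eigh` returns the eigenvalues in ascending order, so the retained modes appear from lowest to highest signal-to-noise ratio.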



2.2 Application to interferometer observations 

The previous technique has been adapted to deal with interferometric data 
by [3] and applied to VSA data in [30]. 

Let us consider an interferometer observing a small region of the sky at 
frequency $\nu$, for which the flat-sky approximation is valid. In this case the 
complex visibility, which is the response of the interferometer at the considered 
frequency, is given by 

$$ V(u, \nu) = \int P(x, \nu) B(x, \nu) \exp(i 2\pi u \cdot x) \, dx \qquad (8) $$

where x corresponds to the angular position of the observed point on the 
sky and u is the baseline vector in units of the wavelength of the observed 
radiation. $P(x, \nu)$ is the primary beam of the antennas (normalized to unity 
at its peak) and $B(x, \nu)$ corresponds to the brightness distribution on the sky. 

Of course, for a realistic instrument, the effect of instrumental noise should 
also be taken into account. Therefore, the i-th baseline $u_i$ of the interferometer 
will measure 

$$ d(u_i, \nu) = V(u_i, \nu) + n(u_i, \nu) \qquad (9) $$

where $n(u_i, \nu)$ corresponds to the instrumental noise of the $u_i$ visibility. 

Let N be the total number of complex visibilities observed by the 
interferometer. Since the measured quantities are complex, the number of elements 
that constitute the data is $N_d = 2N$, corresponding to the real and 
imaginary parts of each observed visibility. 

Testing the Gaussianity of the measured visibilities is equivalent to testing 
the joint Gaussianity of their real and imaginary parts. Therefore the signal-to- 
noise decomposition can be applied directly to these quantities (so we will have 
a total of $N_d$ eigenmodes). The correlation matrix S of the real and imaginary 
parts of $V(u_i, \nu)$ (i.e. the correlation matrix of the signal) can be computed 
following the work of [21] whereas the noise correlation matrix is determined 
by the characteristics of the instrument. Once the signal-to-noise eigenmodes 
have been obtained, the gof technique can be applied to test the Gaussianity of 
these quantities (or of a subset of them with the highest signal-to-noise ratio) . 
As in the previous case, if the data are distributed as a multinormal, the 
constructed eigenmodes are independent and follow a Gaussian distribution 
of zero mean and unit dispersion. 

A complementary analysis can also be performed on the phases of the 
decorrelated visibilities. If the data are Gaussian, the phases should follow 
a uniform distribution. This can be tested using the Rayner & Best smooth 
tests of gof by considering the appropriate $h_i$ in equation (2) (see [28, 3] for 
details). However, [3] found that, for their considered examples, the phase 
analysis was less sensitive to deviations from Gaussianity than the test based 
on the real and imaginary parts of the visibilities. 
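
The phase test can be illustrated with the sketch below. It assumes that the appropriate $h_i$ for a uniform null hypothesis are the normalised shifted Legendre polynomials, $h_i(u) = \sqrt{2i+1}\, P_i(2u - 1)$, which are orthonormal on the uniform density over [0, 1]; whether this matches the exact choice made in [28, 3] is an assumption. The inputs are taken to be the real and imaginary parts of already decorrelated visibilities, and the function names are hypothetical.

```python
import math


def normalised_legendre(u, k):
    """Return [h_1(u), ..., h_k(u)], orthonormal on the uniform density over [0, 1]."""
    t = 2.0 * u - 1.0                       # map [0, 1] to [-1, 1]
    p_prev, p = 1.0, t                      # P_0 and P_1
    values = [math.sqrt(3.0) * p]           # h_1 = sqrt(3) P_1
    for m in range(1, k):
        # Recurrence (m + 1) P_{m+1} = (2m + 1) t P_m - m P_{m-1}
        p_prev, p = p, ((2 * m + 1) * t * p - m * p_prev) / (m + 1)
        values.append(math.sqrt(2 * (m + 1) + 1) * p)
    return values


def phase_score_statistics(re, im, k=4):
    """Return [U_1^2, ..., U_k^2] for the phases, tested against a uniform pdf."""
    # Phases in [-pi, pi] are rescaled to [0, 1].
    u = [(math.atan2(b, a) + math.pi) / (2.0 * math.pi) for a, b in zip(re, im)]
    n = len(u)
    sums = [0.0] * k
    for value in u:
        for i, h in enumerate(normalised_legendre(value, k)):
            sums[i] += h
    return [(s / math.sqrt(n)) ** 2 for s in sums]
```

As in the Gaussian case, each $U_i^2$ would be compared with a $\chi^2_1$ distribution in the large-sample limit.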

2.3 Some comments about the method 

One of the advantages of the Gaussianity analysis based on the gof test and 
the signal-to-noise formalism is that it is well suited to the study of many 
different kinds of CMB observations. In particular, it can be adapted to deal 
with most of the complications found in real data. For instance, it is not 
affected by the presence of holes in the data or by the use of irregular masks 
and it can easily deal with anisotropic and/or correlated noise. Also, as 
already explained, it can be applied to imaging or interferometric data. Another 
interesting feature of the method is that it allows one to select the fraction 
of the data with a signal-to-noise ratio above a certain threshold. In addition, 
as will be discussed in the next section, it is a very sensitive technique, being 
able to detect different types of deviations in the data (such as intrinsic 
non-Gaussianity, systematic effects or anisotropy of the local power spectrum). 

The main shortcoming of the technique is the large amount of CPU time 
required to calculate the signal-to-noise eigenmodes, since it involves the 
diagonalization of large matrices (of size n x n, where n is the number of data 
to be analysed). However, the method uses only a fraction of the eigenmodes (those 
whose signal-to-noise ratio is higher than a given threshold) and therefore it 
is not necessary to obtain all the eigenmodes and eigenvalues of the problem. 
To take advantage of this fact, [30] proposes the use of the Arnoldi algorithm, 
which significantly speeds up the calculation of the required $y_i$. This method is 
based on the construction of a matrix H of dimension m x m (with m < n) 
such that it is possible to construct a good approximation to certain 
eigenvectors and eigenvalues of A from those of H. In particular, the eigenvectors 
that are well approximated correspond to those with the highest eigenvalues. From 
these quantities it is also possible to construct the eigenmodes with the highest 
signal-to-noise ratio, i.e., those that are kept for the analysis (see [30, 31] for 
details). This significantly reduces the computational cost of the analysis, 
since we work with a matrix of size m x m instead of n x n. 
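
The idea can be illustrated with a generic textbook Arnoldi iteration, sketched below with NumPy. This is not the implementation of [30]; it assumes a dense symmetric matrix A, a random starting vector and no restarting, and the function names are hypothetical. For a symmetric A the small matrix H is tridiagonal up to round-off, so the iteration is equivalent to the Lanczos method.

```python
import numpy as np


def arnoldi(A, m, seed=0):
    """Return (Q, H): an orthonormal Krylov basis Q (n x m) and H = Q^t A Q (m x m)."""
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    Q = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    q = rng.normal(size=n)
    Q[:, 0] = q / np.linalg.norm(q)
    for j in range(m):
        w = A @ Q[:, j]
        for i in range(j + 1):                 # modified Gram-Schmidt
            H[i, j] = Q[:, i] @ w
            w -= H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:                # invariant subspace reached early
            return Q[:, :j + 1], H[:j + 1, :j + 1]
        Q[:, j + 1] = w / H[j + 1, j]
    return Q[:, :m], H[:m, :m]


def leading_eigenpairs(A, m, n_keep):
    """Approximate the n_keep largest eigenpairs of symmetric A from the m x m matrix H."""
    Q, H = arnoldi(A, m)
    H = 0.5 * (H + H.T)                        # symmetric up to round-off for symmetric A
    vals, vecs = np.linalg.eigh(H)
    order = np.argsort(vals)[::-1][:n_keep]    # largest eigenvalues first
    return vals[order], Q @ vecs[:, order]     # Ritz values and Ritz vectors
```

The Ritz vectors returned by `leading_eigenpairs` play the role of the columns of R associated with the highest $E_i$. In practice a library routine such as `scipy.sparse.linalg.eigsh` could be used instead of a hand-written iteration.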



3 Applications to CMB data 

The gof tests were first introduced in the CMB field by [11], who carried 
out a Gaussianity analysis of the MAXIMA data [19]. The results showed that 
the data were compatible with Gaussianity (see also [1, 2]). 

A more recent application of the Rayner & Best gof test has been carried 
out by [30], who present a Gaussianity analysis of the Very Small Array 
(VSA) data [34, 23, 16]. The VSA is an interferometer sited at the Teide 
Observatory (Tenerife), designed to observe the sky on scales from 2° to 10' 
and operating at frequencies between 26 and 36 GHz (see [36] for a detailed 
description). 

In the analysis, most of the fields observed by the VSA were found to 
be compatible with Gaussianity. However, deviations from Gaussianity were 
detected in the $U_2^2$ statistic in three cases. After a thorough analysis of the 
possible origins of these detections, the authors concluded that one of the de- 
viations was associated to a residual systematic effect of a few visibility points, 
which, when corrected, have a negligible effect on the angular power spectrum. 
A second detection seemed to have its origin in a deviation of the local power 
spectrum of the considered field with respect to the power spectrum estimated 
from the complete dataset. This deviation was found at angular scales around 
the third angular peak ($\ell = 700$-$900$). If the affected visibilities were 
removed, a cosmological analysis based only on this modified power spectrum 
and the COBE data showed no differences except for the physical baryon 
density, which decreased by 10 per cent and got closer to the value obtained 
from Big Bang Nucleosynthesis. Finally, the third deviation from Gaussianity 
was found in observations of the Corona Borealis supercluster region [22]. In 
this case, the non-Gaussianity was identified as intrinsic to the data, prob- 
ably due, at least in part, to the presence of Sunyaev-Zeldovich emission in 
the region. This result was later confirmed by the measurements of the 
MITO telescope in this region [6]. A combined maximum likelihood analysis 
of the MITO and the VSA data provided a weak detection of a faint signal 
compatible with an SZ effect, characterized by a Comptonization parameter of 
$y = (7.8^{+5.3}_{-4.4}) \times 10^{-6}$ at 68% CL. 

An application of the gof technique to the Archeops data is currently 
ongoing [14]. Archeops is a balloon-borne experiment dedicated to 
measuring the CMB temperature anisotropies from large to small angular scales 
[8, 9]. It has also been designed as a test bed for the forthcoming Planck high 
frequency instrument. The preliminary results show the good performance of 
the method, which is able to deal with the presence of anisotropic and correlated 
noise in the data. 

The application of the gof technique to the WMAP data [7] is of great 
interest and is currently in progress. Given the large amount of data observed 
by this experiment, a whole-sky analysis at full resolution is unfeasible 
because of the large computational resources required for the signal-to-noise 
decomposition. However, two types of complementary tests are possible: an 
analysis of the full sky at low resolution and a study of small regions of the 
sky at high resolution. Given the sensitivity of the gof tests to deviations 
from a homogeneous and isotropic Gaussian random field, this analysis could shed 
new light on some of the anomalies reported for the WMAP data. 

4 Conclusions 

We have reviewed the Rayner & Best smooth tests of goodness of fit and their 
applications to CMB data. One of the most interesting features of this method 
is that it can deal with most of the complications found in real data, such as 
the use of irregular masks or the presence of anisotropic and/or correlated 
noise. In addition, it has been adapted to deal with either imaging or 
interferometric observations. The main shortcoming of the technique is the large 
computational cost required to perform the signal-to-noise decomposition of 
the data. However, this problem can be significantly alleviated by the use of 
approximate methods such as the Arnoldi algorithm. 

The recent and current applications of the gof tests to different datasets 
demonstrate their good performance. Most notably, the method has been able 
to detect deviations from a homogeneous and isotropic Gaussian field in the 
VSA data, which were associated with very different origins: residual 
systematics, a deviation of the local power spectrum with respect to the global 
one and non-Gaussianity intrinsic to the data. It is important to mention that 
Gaussianity analyses had already been performed on the VSA dataset using other 
methods [32, 33] but neither the residual systematics nor this small deviation 
of the power spectrum were detected. Therefore we believe that this method 
constitutes a very useful tool for the statistical analysis of CMB data. 